12:12
2026-06-23
lesswrong.com
large-language-models
Catastrophic Forgetting and Safety Erosion Are Driven by the Same Mechanism and Should Be Monitored by the Same Tools
Researchers have found that catastrophic forgetting and safety erosion in large language models are driven by the same gradient-interference mechanism, suggesting that tools used to monitor and mitiga…